Common Kubernetes Deployment Issues and How to Fix Them
This document covers the most common Kubernetes deployment problems, their likely causes, and the steps to fix them.
1. ImagePullBackOff / ErrImagePull
Problem:
The pod cannot pull the container image.
Possible Causes:
- Incorrect image name or tag
- Image is hosted in a private registry without authentication
- Rate limiting on public registries (e.g., DockerHub)
How to Fix:
- Check the image name and tag using
kubectl describe pod <pod-name>
- If using a private registry, create and apply an imagePullSecret:
kubectl create secret docker-registry myregistrykey \
--docker-username=<user> \
--docker-password=<password> \
--docker-server=<registry>
Add to your pod or deployment:
imagePullSecrets:
- name: myregistrykey
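If every pod in the namespace pulls from the same private registry, you can also attach the secret to the service account so individual pod specs don't need it. Here `myregistrykey` is the secret created above, and `default` is the namespace's default service account:

```shell
# Attach the pull secret to the default service account;
# new pods in this namespace will then use it automatically.
kubectl patch serviceaccount default \
  -p '{"imagePullSecrets": [{"name": "myregistrykey"}]}'
```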
2. CrashLoopBackOff
Problem:
The container is repeatedly crashing and restarting.
Possible Causes:
- The application inside the container is exiting unexpectedly
- Invalid configuration or environment variables
- Failing liveness or readiness probes
How to Fix:
- Retrieve logs with:
kubectl logs <pod-name> --previous
- Verify entrypoint, startup scripts, and environment variables
- Use initContainers if dependencies need to be initialized first
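As a sketch, an initContainer can block startup until a dependency is reachable. The service name `db`, port `5432`, and the busybox image are illustrative — substitute your actual dependency:

```yaml
spec:
  initContainers:
    - name: wait-for-db
      image: busybox:1.36
      # Block until the dependent service resolves and accepts connections
      command: ["sh", "-c", "until nc -z db 5432; do sleep 2; done"]
  containers:
    - name: app
      image: myapp:latest  # illustrative image name
```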
3. Pending Pods
Problem:
Pods remain in a Pending state and are not scheduled.
Possible Causes:
- Requested resources (CPU, memory, GPU) exceed what’s available
- Taints on nodes prevent scheduling
- Affinity or anti-affinity rules cannot be satisfied
How to Fix:
- Describe the pod:
kubectl describe pod <pod-name>
- Adjust resource requests/limits or scale your cluster
- Add tolerations to your pod spec if needed:
tolerations:
- key: "example"
operator: "Exists"
effect: "NoSchedule"
4. Service Not Reachable
Problem:
A pod or service cannot connect to another service using its DNS name.
Possible Causes:
- Incorrect service selectors
- No matching pods (endpoints list is empty)
- DNS resolution issues
How to Fix:
- Check the service:
kubectl describe svc <svc-name>
- Make sure the pods have labels that match the service selector
- Verify the service has active endpoints:
kubectl get endpoints <svc-name>
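A quick way to test service DNS from inside the cluster is a throwaway pod (the busybox image is assumed to be available in your environment):

```shell
# Resolve the service name from inside the cluster; the pod is deleted on exit
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never \
  -- nslookup <svc-name>.<namespace>.svc.cluster.local
```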
5. ConfigMap or Secret Not Found
Problem:
Pod startup fails due to missing ConfigMap or Secret.
Possible Causes:
- The ConfigMap or Secret doesn't exist
- Typo in the resource name
- Resource exists in a different namespace
How to Fix:
- Ensure the ConfigMap or Secret is created in the correct namespace
- Validate the names and keys used in the pod spec
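To confirm the resource actually exists where the pod expects it, list and inspect it in the pod's namespace:

```shell
# List ConfigMaps and Secrets in the pod's namespace
kubectl get configmap,secret -n <namespace>

# Inspect the keys a ConfigMap actually contains
kubectl describe configmap <configmap-name> -n <namespace>
```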
6. Liveness or Readiness Probe Failures
Problem:
Probes fail, causing the pod to restart or remain unready.
Possible Causes:
- Application takes time to become ready
- Incorrect path or port specified
- Probes are too aggressive
How to Fix:
- Add initialDelaySeconds and adjust the probe intervals:
initialDelaySeconds: 10
periodSeconds: 5
- Validate that the health endpoint inside the container is working
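Putting these settings together, a liveness probe with a generous startup delay might look like this. Port 8080 and the /healthz path are assumptions — use your application's real health endpoint:

```yaml
livenessProbe:
  httpGet:
    path: /healthz           # assumed health endpoint
    port: 8080               # assumed container port
  initialDelaySeconds: 10    # give the app time to start
  periodSeconds: 5           # probe every 5 seconds
  failureThreshold: 3        # restart only after 3 consecutive failures
```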
7. Deployment Not Updating
Problem:
Changes to the deployment do not result in new pods being created.
Possible Causes:
- No actual changes in the pod template
- Deployment rollout is paused
How to Fix:
- Ensure the pod spec changes (e.g., image tag, env var)
- Force a rollout if needed:
kubectl rollout restart deployment <deployment-name>
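To check whether the rollout is stuck or paused, and resume it if so:

```shell
# Shows rollout progress, or reports that the deployment is paused
kubectl rollout status deployment/<deployment-name>

# Resume a paused rollout
kubectl rollout resume deployment/<deployment-name>
```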
8. PVC Pending
Problem:
PersistentVolumeClaim (PVC) remains in Pending state.
Possible Causes:
- No available PersistentVolume (PV)
- StorageClass does not exist or does not match
- Mismatch in requested size or access mode
How to Fix:
- Check available PVs and StorageClasses:
kubectl get pv
kubectl get storageclass
- Ensure a PV with matching size, access mode, and storage class exists
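A PVC only binds when its storage class, access mode, and size can all be satisfied. For example (the `standard` storage class name is an assumption — use one listed by `kubectl get storageclass`):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  storageClassName: standard  # must match an existing StorageClass
  accessModes:
    - ReadWriteOnce           # must be supported by the backing PV
  resources:
    requests:
      storage: 1Gi            # must fit within an available PV
```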
9. Node Pressure (Disk/CPU/Memory)
Problem:
Pods are evicted or not scheduled due to node resource constraints.
Possible Causes:
- Node is under resource pressure
- Kubelet evicts pods when thresholds are breached
How to Fix:
- Inspect node status:
kubectl describe node <node-name>
- Free up resources or reschedule workloads
- Tune kubelet eviction settings if necessary
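To see which pressure condition the kubelet has reported and what the nodes are actually using (`kubectl top` requires metrics-server to be installed):

```shell
# Conditions such as MemoryPressure or DiskPressure appear here
kubectl describe node <node-name> | grep -A5 Conditions

# Current CPU/memory usage per node (requires metrics-server)
kubectl top nodes
```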
10. RBAC Permission Denied
Problem:
Service account or user is denied permission for an action.
Possible Causes:
- Missing Role or ClusterRole
- No RoleBinding or ClusterRoleBinding assigned
How to Fix:
- Test access using:
kubectl auth can-i get pods --as=system:serviceaccount:<namespace>:<service-account>
- Apply the appropriate Role or ClusterRole and bind it to the service account
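A minimal Role and RoleBinding granting a service account read access to pods; the names and namespace are illustrative:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: my-namespace
rules:
  - apiGroups: [""]           # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-service-account  # the account that was denied
    namespace: my-namespace
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```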